1. INTRODUCTION

The purpose of this paper is to present several FORTRAN
subroutines for updating the QR decomposition of a matrix. Let
$A \in \mathbb{R}^{m \times n}$, $m \ge n$, have a QR decomposition $A = QR$,
where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns, and
$R \in \mathbb{R}^{n \times n}$ is upper triangular. Assume that the elements
of $Q$ and $R$ are explicitly known. Let $\tilde{A} \in \mathbb{R}^{p \times q}$,
$p \ge q$, be obtained from $A$ by inserting or deleting a row or a column,
or let $\tilde{A}$ be a rank-one modification of $A$, i.e.,
$\tilde{A} = A + vu^T$, where $u \in \mathbb{R}^n$, $v \in \mathbb{R}^m$.
Then a QR decomposition of $\tilde{A}$, $\tilde{A} = \tilde{Q}\tilde{R}$,
where $\tilde{Q} \in \mathbb{R}^{p \times q}$ has orthonormal columns and
$\tilde{R} \in \mathbb{R}^{q \times q}$ is upper triangular, can be computed
in $O(mn)$ arithmetic operations by updating $Q$ and $R$; see Daniel et al.
[5]. The updating is done by applying Givens reflectors. The
operation count for updating $Q$ and $R$ compares favorably with the
$O(mn^2)$ arithmetic operations necessary to compute a QR decomposition
of a general $m \times n$ matrix.
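To make the $O(mn)$ updating idea concrete, the following is a minimal sketch of a column deletion, written in Python rather than FORTRAN 77 for brevity. It is not the code of the subroutines presented here: the function name `delete_column`, the list-of-rows storage, and the use of plane rotations (instead of the Givens reflectors mentioned above) are assumptions made for the illustration.

```python
import math


def delete_column(Q, R, k):
    """Return (Q1, R1) with Q1 * R1 equal to A = Q * R without column k.

    Q is m x n with orthonormal columns, R is n x n upper triangular,
    both stored as lists of rows. The inputs are not modified.
    """
    n = len(R)
    Q1 = [row[:] for row in Q]
    # Dropping column k of R leaves an n x (n-1) upper Hessenberg matrix.
    R1 = [row[:k] + row[k + 1:] for row in R]
    for j in range(k, n - 1):
        a, b = R1[j][j], R1[j + 1][j]
        r = math.hypot(a, b)
        if r == 0.0:
            continue  # nothing to annihilate in this column
        c, s = a / r, b / r
        # Rotate rows j and j+1 of R1 to annihilate the subdiagonal entry.
        for t in range(j, n - 1):
            x, y = R1[j][t], R1[j + 1][t]
            R1[j][t] = c * x + s * y
            R1[j + 1][t] = -s * x + c * y
        R1[j + 1][j] = 0.0  # remove rounding residue
        # Apply the same rotation to columns j and j+1 of Q1 (O(m) work).
        for row in Q1:
            x, y = row[j], row[j + 1]
            row[j] = c * x + s * y
            row[j + 1] = -s * x + c * y
    # The last row of R1 is now zero; drop it and the matching column of Q1.
    return [row[:n - 1] for row in Q1], R1[:n - 1]
```

At most $n - 1$ rotations are applied, each costing $O(m)$ operations on the orthonormal factor, which is where the $O(mn)$ count comes from.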

Algol procedures for computing the updated factors $\tilde{Q}$ and
$\tilde{R}$ from $Q$ and $R$ are presented by Daniel et al. [5]. Buckley [2]
translated these procedures into FORTRAN. Our FORTRAN subroutines
implement modifications of the Algol procedures in [5]. These
modifications speed up the subroutines and make them suitable for use
on vector computers. This is illustrated by timing experiments.

Several program libraries, such as LINPACK [6] and NAG [14],
provide subroutines for updating R only, but contain no routines for
updating the complete QR decomposition. Advantages of updating both
Q and R include that downdating can be carried out stably, and that
the individual elements of projections are easily accessible; see
LINPACK [6, p. 10.23], Daniel et al. [5], and Stewart [17].


The first comprehensive survey of updating algorithms was
presented by Gill et al. [8], and a recent discussion with references
to applications can be found in Golub and Van Loan [10, Chapter
12.6]. The applications include linear least squares problems,
regression analysis, and the solution of nonlinear systems of
equations. See Allen [1], Goldfarb [9], Gragg and Stewart [11], Moré
and Sorensen [13]. The algorithms would also appear to be applicable
to recursive least squares problems of signal processing; see Ling et
al. [12].

We also present a subroutine which implements the rank revealing
QR decomposition method recently proposed by Chan [3]. In this
method the QR decomposition $A = QR$ is updated to yield the QR
decomposition $\tilde{A} = \tilde{Q}\tilde{R}$, where $\tilde{A}$ is obtained
from $A$ by column permutation. This permutation is selected so that, in
general, the element(s) in the lower right corner of $\tilde{R}$ are small
if $A$ has nearly linearly dependent columns. The subroutine can be used
to solve the subset selection problem; see Golub and Van Loan [10].

Table 1.1 lists the FORTRAN subroutines for updating the QR
decomposition. All subroutines use double precision arithmetic and are
written in FORTRAN 77. Section 2 contains programming details for the
subroutines of Table 1.1 and for certain auxiliary subprograms. For
all subroutines of Table 1.1, except DRRPM, the numerical method as
well as Algol procedures have been presented in [5]. For these
subroutines we will only discuss differences between our FORTRAN
subroutines and the Algol procedures. These differences stem in part
from the algorithms being sped up, as well as from the use of simple
subroutines, BLAS, for elementary vector and matrix operations.
Subroutine   Purpose

DDELC        Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is
             obtained from $A$ by deleting a column; see [5].

DDELR        Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is
             obtained from $A$ by deleting a row; see [5].

DINSC        Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is
             obtained from $A$ by inserting a column; see [5].

DINSR        Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is
             obtained from $A$ by inserting a row; see [5].

DRNK1        Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is a
             rank-one modification of $A$; see [5].

DRRPM        Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is
             obtained by permuting the columns of $A$ in a manner that
             generally reveals if columns of $A$ are nearly linearly
             dependent; see [3].

Table 1.1: Subroutines for updating a QR decomposition $A = QR$
to yield a QR decomposition $\tilde{A} = \tilde{Q}\tilde{R}$.


The BLAS are discussed in Section 3. They have been written to
vectorize efficiently on an IBM 3090-200VF computer using the
vectorizing compiler VS FORTRAN 2.3.0 without special compiler
directives. Most BLAS were obtained by modifying LINPACK BLAS [6].
We hope that the provided BLAS vectorize well without excessive
timing increases also on other vector computers. Section 4 contains
output from a driver illustrating the use of the subroutines. A
listing of the source code of the driver is provided in the Appendix.
Section 4 also contains some timing results.
2. THE UPDATING SUBROUTINES

We consider the subroutines of Table 1.1 in order. These
subroutines use auxiliary subroutines which we need to introduce
first. They are listed in Table 2.1.


Auxiliary    Called by
subroutine   subroutine     Purpose

DORTHO       DINSC, DRNK1   Compute $s := Q^T w$, $v := (I - QQ^T)w$ with
                            reorthogonalization for arbitrary vector $w$.

DORTHX       DDELR          Compute $s := Q^T e_j$, $v := (I - QQ^T)e_j$ with
                            reorthogonalization for axis vector $e_j$.

DINVIT       DRRPM          Compute approximation of a right singular
                            vector corresponding to a least singular
                            value of R. A first approximation is obtained
                            from the LINPACK condition number estimator
                            DTRCO, and is improved by inverse iteration.

DTRLSL       DINVIT         Solve lower triangular system of equations
                            with frequent rescalings in order to avoid
                            overflow. Similar to part of DTRCO.

DTRUSL       DINVIT         Solve upper triangular linear system of
                            equations with frequent rescalings in order
                            to avoid overflow. Similar to part of DTRCO.

Table 2.1: Auxiliary subroutines.


2.1 Subroutines DORTHO and DORTHX

Given a matrix $Q \in \mathbb{R}^{m \times n}$, $m \ge n$, with orthonormal
columns and a vector $w \in \mathbb{R}^m$, the subroutine DORTHO computes the
Fourier coefficients $s := Q^T w$ and the orthogonal projection of $w$ into
the null space of $Q^T$, $v := (I - QQ^T)w$. At most one reorthogonalization
is carried out. Since the subroutine DORTHO differs from the
corresponding Algol procedure "orthogonalize" in [5], we discuss
DORTHO and its use in some detail.

Subroutine DORTHO is called by routine DINSC, which updates the
QR factorization of a matrix $A = QR \in \mathbb{R}^{m \times n}$, $m > n$,
when a column $w$ is inserted into $A$. Updating may not be meaningful if
$w$ is nearly a linear combination of the columns of $Q$. Therefore DORTHO
computes the condition number of the matrix
$\hat{Q} := [Q, w/\|w\|] \in \mathbb{R}^{m \times (n+1)}$, where the norm
$\|\cdot\|$ is the Euclidean norm. Using $Q^T Q = I$, we obtain the
following expressions for the singular values
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n+1}$ of $\hat{Q}$:

    $\sigma_1 = (1 + \|Q^T w\|/\|w\|)^{1/2}$,              (2.1a)

    $\sigma_j = 1$,  $2 \le j \le n$,                      (2.1b)

    $\sigma_{n+1} = (1 - \|Q^T w\|/\|w\|)^{1/2}$.          (2.1c)

Further, for $v := (I - QQ^T)w/\|w\|$,

    $\|v\| = \sigma_1 \sigma_{n+1}$.                       (2.2)

Since $1 \le \sigma_1 \le \sqrt{2}$, $\sigma_{n+1}$ is also an accurate
estimate of the length of the orthogonal projection of $w/\|w\|$ into the
null space of $Q^T$. In order to avoid severe cancellation of significant
digits in (2.1c) we determine first $\sigma_1$ from (2.1a) and then
$\sigma_{n+1}$ from (2.2).
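The order of evaluation described above can be sketched as follows. This is an illustration in Python, not the FORTRAN code of DORTHO; the function name `extreme_singular_values` and the storage of $Q$ as a list of columns are assumptions of the sketch.

```python
import math


def extreme_singular_values(q_cols, w):
    """Return (sigma_1, sigma_{n+1}) of [Q, w/||w||].

    q_cols holds the orthonormal columns of Q. sigma_1 is taken from
    (2.1a) and sigma_{n+1} from (2.2), so (2.1c) is never evaluated.
    """
    w_norm = math.sqrt(sum(x * x for x in w))
    u = [x / w_norm for x in w]                      # u = w / ||w||
    s = [sum(qi * ui for qi, ui in zip(q, u)) for q in q_cols]  # Q^T u
    v = list(u)
    for q, sj in zip(q_cols, s):                     # v = u - Q s
        v = [vi - sj * qi for vi, qi in zip(v, q)]
    s_norm = math.sqrt(sum(x * x for x in s))
    v_norm = math.sqrt(sum(x * x for x in v))
    sigma_1 = math.sqrt(1.0 + s_norm)                # (2.1a)
    sigma_last = v_norm / sigma_1                    # (2.2)
    return sigma_1, sigma_last
```

For instance, with $Q = [e_1, e_2]$ in $\mathbb{R}^3$ and $w = (1, 0, 10^{-9})^T$, the quantity $\|Q^T w\|/\|w\|$ rounds to 1 in double precision, so evaluating (2.1c) directly would give zero, whereas the route through (2.2) retains the value of about $7.07 \cdot 10^{-10}$.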

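Subroutines DINSC and DORTHO have an input parameter RCOND which
is a lower bound for the reciprocal condition number
$\sigma_{n+1}/\sigma_1$ of the matrix $[Q, w/\|w\|]$. The
computations are discontinued and an error flag is set if the
reciprocal condition number is smaller than RCOND. On exit,
RCOND := $\sigma_{n+1}/\sigma_1$.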
Assume now that the input value of RCOND does not exceed the
reciprocal condition number $\sigma_{n+1}/\sigma_1$. Then DORTHO
computes $s := Q^T w$ and $v := (I - QQ^T)w$ by a scheme analogous to the
method described by Parlett [15, p. 107] for orthogonalizing a vector
against another vector. For definiteness, we present the
orthogonalization scheme. References to $\sigma_1$, $\sigma_{n+1}$, and
RCOND are neglected for simplicity.

Orthogonalization algorithm; input Q (Q in R^{m x n} has orthonormal
columns), m, n (m >= n), w in R^m (w /= 0); output v (v = (I - QQ^T)w),
s (s = Q^T w);

    w := w/||w||;
    s := Q^T w;  v := w - Qs;                                    (2.3)
    if ||v|| > 0.707 then
        v := v/||v||;  s := s||w||;  exit;    « ||v|| = 1, Q^T v = 0 »
    s' := Q^T v;  v' := v - Qs';                                 (2.4)
    if ||v'|| < 0.707||v|| then
        « w lies in span{Q} numerically »
        v := 0;  s := (s + s')||w||;  set flag;  exit;
    v := v'/||v'||;  s := (s + s')||w||;  exit;  « ||v|| = 1, Q^T v = 0 »

The proof in Parlett [15, pp. 107-108] that one reorthogonalization
suffices carries over to the present algorithm, using that $Q^T Q = I$.
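The orthogonalization scheme with at most one reorthogonalization can be sketched as below. The sketch is in Python rather than FORTRAN 77 and is not the code of DORTHO; the names `orthogonalize` and `kappa`, the list-of-columns storage of $Q$, and the returned boolean flag are assumptions of the illustration. The final test uses `<=` instead of `<` so that an exactly zero $v$ is also flagged.

```python
import math


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _subtract_projection(q_cols, x, coeffs):
    """Return x - Q * coeffs, with Q given as a list of columns."""
    out = list(x)
    for q, c in zip(q_cols, coeffs):
        out = [o - c * qi for o, qi in zip(out, q)]
    return out


def orthogonalize(q_cols, w, kappa=0.707):
    """Return (v, s, in_span): unit vector v orthogonal to the columns
    of Q (or zero), coefficients s = Q^T w, and a flag that is True
    when w lies numerically in the span of the columns of Q."""
    w_norm = math.sqrt(_dot(w, w))
    if w_norm == 0.0:
        raise ValueError("w must be nonzero")
    u = [x / w_norm for x in w]
    # All coefficients are formed before v is updated, step (2.3).
    s = [_dot(q, u) for q in q_cols]
    v = _subtract_projection(q_cols, u, s)
    v_norm = math.sqrt(_dot(v, v))
    if v_norm > kappa:
        return [x / v_norm for x in v], [c * w_norm for c in s], False
    # One reorthogonalization, step (2.4).
    s2 = [_dot(q, v) for q in q_cols]
    v2 = _subtract_projection(q_cols, v, s2)
    v2_norm = math.sqrt(_dot(v2, v2))
    s_total = [(a + b) * w_norm for a, b in zip(s, s2)]
    if v2_norm <= kappa * v_norm:
        return [0.0] * len(w), s_total, True
    return [x / v2_norm for x in v2], s_total, False
```

A call such as `orthogonalize([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 1.0, 0.0])` sets the flag, since that $w$ lies in the span of the two columns.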

We note that there are other ways to carry out the computations
on lines (2.3)-(2.4). In [5], v and v' are updated immediately after
a component of s is computed. Our scheme has the advantages of being
faster on vector computers, since it allows matrix vector operations,
and it is also, generally, more accurate, since rounding errors
accumulate less. The latter can easily be shown, and we omit the
details.

We turn to subroutine DORTHX. This is a faster version of
subroutine DORTHO. DORTHX assumes that w in the orthogonalization
algorithm is an axis vector. This simplifies the computations in
(2.3). DORTHX may perform nearly twice as fast as DORTHO.

2.2 Subroutines DINVIT, DTRLSL and DTRUSL

Given a nonsingular upper triangular matrix
$U = [u_{jk}] \in \mathbb{R}^{n \times n}$ and a vector
$b = [\beta_j] \in \mathbb{R}^n$, DTRUSL solves $Ux = b\rho$, where
$|\rho| \le 1$ is a scaling factor such that $|\beta_j \rho / u_{jj}| \le 1$
for all $j$. The scaling factor is introduced in order to avoid overflow
when solving very ill-conditioned linear systems of equations. DTRLSL
is an analogous subroutine for lower triangular systems.
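A scaled back substitution of this kind can be sketched as follows, in Python rather than FORTRAN 77. This is not the code of DTRUSL: the function name `solve_upper_scaled` is hypothetical, and the rescaling rule (shrink the whole working vector whenever the current component would exceed the diagonal entry in magnitude, in the style of the LINPACK condition estimator) is an assumption of the sketch.

```python
def solve_upper_scaled(U, b):
    """Solve U x = rho * b for upper triangular, nonsingular U.

    Returns (x, rho) with 0 < rho <= 1 chosen during the solve so that
    every component of x is at most 1 in magnitude when it is formed.
    U is a list of rows; b is a list.
    """
    n = len(b)
    z = list(b)          # working right-hand side, overwritten by x
    rho = 1.0
    for j in range(n - 1, -1, -1):
        if abs(z[j]) > abs(U[j][j]):
            # Rescale the whole system so the division cannot overflow.
            scale = abs(U[j][j]) / abs(z[j])
            z = [scale * zi for zi in z]
            rho *= scale
        z[j] = z[j] / U[j][j]
        for i in range(j):
            z[i] -= U[i][j] * z[j]
    return z, rho
```

Because the components already computed and the remaining right-hand side are scaled together, the returned pair always satisfies $Ux = \rho b$ up to rounding.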

DTRLSL and DTRUSL are called by DINVIT, a subroutine for
computing an approximation of a right singular vector belonging to a
least singular value of a right triangular matrix $R$. If $R$ is
singular then such a singular vector is computed by solving a
triangular linear system of equations. Otherwise an initial
approximate right singular vector $a^{(0)} = \{a_j^{(0)}\}_{j=1}^{n}$ is
obtained by the LINPACK condition number estimator DTRCO, and inverse
iteration with $R^T R$ is used to obtain improved approximations
$a^{(j)}$, $j = 1, 2, \ldots,$ NMBIT, where NMBIT is an input parameter to
DINVIT and DRRPM. On exit from DINVIT and DRRPM, IPOS(j) contains the
least index $k$ such that $|a_k^{(j)}| \ge |a_i^{(j)}|$ for all $i$, for
each $j$ up to NMBIT. On return from DINVIT and DRRPM the parameter DELTA
is given by
DELTA := $\|R^T R a^{(\mathrm{NMBIT})}\| / \|a^{(\mathrm{NMBIT})}\|$.
Hence, DELTA is an upper bound for the least singular value of $R$.
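Inverse iteration with $R^T R$ needs only two triangular solves per step, as the following Python sketch shows. It is not the code of DINVIT: the function names are hypothetical, the starting vector of all ones replaces the DTRCO estimate, no overflow-avoiding rescaling is done, and the returned bound is $\|Ra\|/\|a\|$, chosen for the sketch because it is never smaller than the least singular value of $R$; it is not claimed to be the expression used for DELTA.

```python
import math


def solve_upper(R, b):
    """Back substitution for R x = b, R upper triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = b[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x


def solve_upper_transposed(R, b):
    """Forward substitution for R^T y = b."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        acc = b[i] - sum(R[j][i] * y[j] for j in range(i))
        y[i] = acc / R[i][i]
    return y


def inverse_iteration(R, nmbit, start=None):
    """Return (a, ipos, bound) after nmbit steps of (R^T R) a_new = a."""
    n = len(R)
    a = list(start) if start is not None else [1.0] * n
    ipos = []
    for _ in range(nmbit):
        a = solve_upper(R, solve_upper_transposed(R, a))
        norm = math.sqrt(sum(x * x for x in a))
        a = [x / norm for x in a]
        # Least index of a component of largest magnitude.
        ipos.append(max(range(n), key=lambda k: abs(a[k])))
    Ra = [sum(R[i][j] * a[j] for j in range(i, n)) for i in range(n)]
    bound = math.sqrt(sum(x * x for x in Ra)) / math.sqrt(sum(x * x for x in a))
    return a, ipos, bound
```

The list `ipos` plays the role of IPOS in the text, recording after each step where the largest component of the current iterate sits.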
2.3 Updating subroutines

We are in a position to consider the subroutines of Table 1.1.
The vectorization is mainly done in the BLAS of the next section, but
some loops of the subroutines of Table 1.1 vectorize as well.
Comments in the source code reveal which loops vectorize or are
eligible for vectorization on an IBM 3090-200VF computer with
compiler VS FORTRAN 2.3.0 when the default vectorization
directives are used. For applications to particular problem classes,
changing the default vectorization by compiler directives may
decrease the execution time.

We list the differences between the subroutines of Table 1.1 and
the corresponding Algol procedures of [5]. Some of these
modifications were suggested in [5] but not implemented in the Algol
procedures [5]. In subroutine DDELC, the column deleted from $A = QR$
is determined optionally. Not computing this column saves $O(mn)$
arithmetic operations. In subroutine DDELR, the auxiliary subroutine
DORTHX is used instead of DORTHO. As indicated in Section 2.1 the
former subroutine may perform nearly twice as fast. In subroutine
DINSC, a column $w$ is inserted into $A = QR$ only if the reciprocal
condition number of the matrix $[Q, w/\|w\|]$ is larger than a bound given
by the parameter RCOND on entry. The parameter RCOND can be used to
prevent updating when $w/\|w\|$ is nearly in the range of $Q$. Finally,
DRNK1 performs slightly faster if the updated matrix $A + vu^T$ is such
that $v$ lies numerically in the range of $A$.

The subroutine DRRPM implements an algorithm presented by Chan
[3]. The computation of an approximate right singular vector
corresponding to a least singular value is done by subroutine DINVIT
and has already been discussed. The position of a component of
largest magnitude of this singular vector has to be determined, and
we found, in agreement with Chan's suggestion [3], that two inverse
iterations suffice. In fact, in all computed examples, one inverse
iteration was sufficient, even for problems with multiple or close
least singular values. The subroutine permutes the order of columns
1 through $k$ of $A\Pi$ where $k$ is an input parameter,
$A \in \mathbb{R}^{m \times n}$, $m \ge n$, and $\Pi$ is a permutation
matrix. DRRPM is typically called with $k = n, n-1, n-2, \ldots$ until no
further permutation is made or until the computed upper bound DELTA
for the least singular value of the matrix consisting of the first $k$
columns of $A\Pi$ is not small.

The subroutines of Table 1.1 neither require nor produce a
factorization with nonnegative diagonal elements of the upper
triangular matrix.

3.  THE  BLAS 

Much computational experience on a variety of computers led
Dongarra and Sorensen [7] to conclude that nearly optimal performance
of numerical linear algebra subroutines can be achieved if the
subroutines for the basic matrix and vector operations, such as
multiplication, addition and inner product computation, are written to
perform well on vector computers. We wanted to write a code that
performs well on an IBM 3090-200VF computer, and that would not
require excessive tuning when moved to other (vector) computers.
Therefore we designed the code to vectorize well without special
compiler instructions, since the latter would be machine dependent.
A feature of the VS FORTRAN 2.3.0 compiler is that unnecessary
vector loads and stores are avoided by introducing a temporary scalar
variable, denoted by ACC in the subroutine DAPX in Example 3.1.
During execution ACC should be thought of as a vector variable stored
in a vector register. Timings for DAPX and comparison with code with
explicitly unrolled loops have been carried out by Robert and
Sguazzero [16]. These timings show subroutine DAPX to perform better
than equivalent subroutines with explicitly unrolled loops.


Example 3.1. Subroutine for matrix vector multiplication.

      SUBROUTINE DAPX(A,LDA,M,N,X,Y)
C
C     DAPX COMPUTES Y:=A*X.
C
      INTEGER LDA,M,N,I,J
      REAL*8 A(LDA,N),X(N),Y(M),ACC
C
C     OUTER LOOP VECTORIZES.
C
      DO 10 I=1,M
         ACC=0D0
         DO 20 J=1,N
            ACC=ACC+A(I,J)*X(J)
   20    CONTINUE
         Y(I)=ACC
   10 CONTINUE
      RETURN
      END


Temporary scalar variables have also been used in others of the 17
BLAS used.


4. COMPUTED EXAMPLES

Example 4.1. In this example the QR decomposition of a 4 x 3 matrix
A is updated. The use of all subroutines of Table 1.1 is
illustrated. The main program producing this output is listed in the
Appendix.


Example 4.2. Execution times for subroutines DDELCO and DRNK1 are
compared for scalar and vector arithmetic. The measured cpu times
differed somewhat between different executions of the same code.
Therefore the reported times are rounded to one significant digit and
the quotients of measured cpu times are rounded to the nearest
multiple of 1/2.

Table 4.1 shows the cpu times for DDELCO. This routine and its
subroutines have been compiled with the VS FORTRAN 2.3.0 compiler.
The times for vector arithmetic are obtained from code generated with
compiler option vlev = 2, which makes the compiler generate code that
utilizes the vector registers and arithmetic. The times for scalar
arithmetic are obtained from code generated with compiler option vlev
= 0, which makes the compiler generate code that does not use vector
instructions. Given a QR decomposition of a matrix
$A \in \mathbb{R}^{m \times n}$, Table 4.1 shows the cpu time required by
DDELCO to compute the QR decomposition of
$\tilde{A} \in \mathbb{R}^{m \times (n-1)}$ obtained by deleting column one
of $A$.


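                    cpu time in seconds
                                                      scalar time /
     m     n   scalar arithmetic   vector arithmetic   vector time

    10    10   $4 \cdot 10^{-4}$   $4 \cdot 10^{-4}$       1
    20    10   $4 \cdot 10^{-4}$   $4 \cdot 10^{-4}$       1
    30    10   $5 \cdot 10^{-4}$   $4 \cdot 10^{-4}$       1.5
    50    10   $6 \cdot 10^{-4}$   $4 \cdot 10^{-4}$       1.5
    75    10   $8 \cdot 10^{-4}$   $4 \cdot 10^{-4}$       2
   128    10   $1 \cdot 10^{-3}$   $5 \cdot 10^{-4}$       2
  1024    10   $7 \cdot 10^{-3}$   $2 \cdot 10^{-3}$       3.5
  1280    10   $9 \cdot 10^{-3}$   $3 \cdot 10^{-3}$       3.5

Table 4.1: Timings for DDELCO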
Table 4.2 is similar to Table 4.1 and contains execution times for
DRNK1. The reduction in execution time obtained by using vector
instructions is of the same order of magnitude for the other updating
routines, too.

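                    cpu time in seconds
                                                      scalar time /
     m     n   scalar arithmetic   vector arithmetic   vector time

    16    12   $1 \cdot 10^{-3}$   $1 \cdot 10^{-3}$       1
    32    25   $4 \cdot 10^{-3}$   $3 \cdot 10^{-3}$       1.5
    64    50   $2 \cdot 10^{-2}$   $7 \cdot 10^{-3}$       2
   128   100   $6 \cdot 10^{-2}$   $2 \cdot 10^{-2}$       2.5
  1024   100   $4 \cdot 10^{-1}$   $8 \cdot 10^{-2}$       4.5
  1250   100   $5 \cdot 10^{-1}$   $9 \cdot 10^{-2}$       5

Table 4.2: Timings for DRNK1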


Example 4.3. Execution times for subroutines written by Buckley [2]
and those of Table 1.1 are compared. The vectorized and scalar codes
were generated as explained in Example 4.2. We found that
vectorization of the subroutines in [2] did not change the execution
times significantly, generally less than 20%. In all computed
examples the vectorized subroutines in [2] required at least twice as
much execution time as the vectorized subroutines of Table 1.1.
For certain problems our vectorized code executed up to 95 times
faster than the vectorized code in [2]. For scalar code the
differences in execution time often decreased with increasing matrix
size. Tables 4.3-4.6 present some sample timings.
           time for            time for          time for DELROW [2] /
     m     DDELR               DELROW [2]        time for DDELR

    10     $3 \cdot 10^{-4}$   $9 \cdot 10^{-4}$       3.5
    64     $7 \cdot 10^{-4}$   $2 \cdot 10^{-3}$       2
   128     $8 \cdot 10^{-4}$   $3 \cdot 10^{-3}$       3
  1024     $4 \cdot 10^{-3}$   $2 \cdot 10^{-2}$       4

Table 4.3: The first row of A = QR is deleted. Cpu times for
vectorized code for updating Q and R are given in
seconds; n = 10.

                 time for            time for          time for DELCOL [2] /
     m      n    DDELC               DELCOL [2]        time for DDELC

  1024     10    $1 \cdot 10^{-4}$   $2 \cdot 10^{-4}$       2
  1024    100    $1 \cdot 10^{-4}$   $7 \cdot 10^{-3}$       $7.5 \cdot 10^{1}$
  1280    100    $1 \cdot 10^{-4}$   $9 \cdot 10^{-3}$       $9.5 \cdot 10^{1}$

Table 4.4: The last column of A = QR is deleted. Cpu times for
vectorized code for updating Q and R are given in
seconds. DDELC does not compute the last column of
A, i.e., IFLAG = 0 on entry.

           time for            time for          time for INSCOL [2] /
     m     DINSC               INSCOL [2]        time for DINSC

    64     $1 \cdot 10^{-3}$   $2 \cdot 10^{-3}$       2
   128     $1 \cdot 10^{-3}$   $3 \cdot 10^{-3}$       2
  1024     $7 \cdot 10^{-3}$   $2 \cdot 10^{-2}$       2.5

Table 4.5: A new first column is inserted into A = QR. Cpu times
for vectorized code for updating Q and R are given in
seconds; n = 10.


Tables 4.3-4.5 present timings for vectorized code. The next table
shows timings for scalar code for the same updatings as in Table 4.3.
Table 4.6 shows that, without vectorization, DELROW [2] requires 50%
more cpu time than DDELR for moderately large problems.


           time for            time for          time for DELROW [2] /
     m     DDELR               DELROW [2]        time for DDELR

    10     $2 \cdot 10^{-4}$   $6 \cdot 10^{-4}$       3
    64     $1 \cdot 10^{-3}$   $2 \cdot 10^{-3}$       1.5
   128     $2 \cdot 10^{-3}$   $3 \cdot 10^{-3}$       1.5
  1024     $1 \cdot 10^{-2}$   $2 \cdot 10^{-2}$       1.5

Table 4.6: The first row of A = QR is deleted. Cpu times for
scalar code for updating Q and R are given in seconds;
n = 10.